Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Interface as simple as possible #13

Draft
wants to merge 9 commits into
base: main
Choose a base branch
from
Draft

Interface as simple as possible #13

wants to merge 9 commits into from

Conversation

gdalle
Copy link
Member

@gdalle gdalle commented Sep 28, 2023

Public service announcement if people look at this PR: it's probably not gonna get merged. I mostly made it as a trigger for discussion, like GraphInterface.jl by @CameronBieganek :)

Based on #9, here is a smaller PR that will hopefully be easier to review and merge as a starting point.
Here are the highlights:

  • required interface for AbstractEdge (4 functions): src, dst, weight, reverse
  • required interface for AbstractGraph (4 functions): is_directed, vertices, out_edges, in_edges
  • optional interface for AbstractGraph, which the users are free to implement any subset of (since everything has a slow fallback)
  • barebones implementations of Simple(Di)Graph and SimpleWeighted(Di)Graph as examples

If you want to review, the easiest starting point is the documentation preview at https://juliagraphs.org/GraphsBase.jl/previews/PR13

Related issues:

@codecov
Copy link

codecov bot commented Sep 28, 2023

Codecov Report

Merging #13 (c001ac9) into main (f43d51f) will increase coverage by 29.91%.
The diff coverage is 29.31%.

@@            Coverage Diff            @@
##           main      #13       +/-   ##
=========================================
+ Coverage      0   29.91%   +29.91%     
=========================================
  Files         0       13       +13     
  Lines         0      117      +117     
=========================================
+ Hits          0       35       +35     
- Misses        0       82       +82     
Files Coverage Δ
src/GraphsBase.jl 100.00% <ø> (ø)
src/SimpleGraphs/SimpleGraphs.jl 100.00% <100.00%> (ø)
src/SimpleWeightedGraphs/SimpleWeightedGraphs.jl 100.00% <100.00%> (ø)
src/interface/utils.jl 63.63% <63.63%> (ø)
src/SimpleGraphs/simplegraph.jl 0.00% <0.00%> (ø)
src/SimpleGraphs/simpledigraph.jl 0.00% <0.00%> (ø)
src/SimpleWeightedGraphs/simpleweightedgraph.jl 0.00% <0.00%> (ø)
src/SimpleGraphs/simpleedge.jl 0.00% <0.00%> (ø)
src/SimpleWeightedGraphs/simpleweightededge.jl 0.00% <0.00%> (ø)
src/SimpleWeightedGraphs/simpleweighteddigraph.jl 0.00% <0.00%> (ø)
... and 3 more


$(TYPEDFIELDS)
"""
mutable struct SimpleDiGraph{T<:Integer} <: AbstractGraph{T,SimpleEdge{T}}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it mutable? I guess to allow in-place modification of ne?

Copy link
Member Author

@gdalle gdalle Sep 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

presumably but it's kinda sad, I'm curious if there's a better way

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe ne::Base.RefValue{Int}?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that was my thought at first, and I think I suggested it somewhere but got corrected, if only I could find the issue...

then again, maybe it's time to start fresh!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess in newer versions of Julia you can mark fadjlist and badjlist as const so it may not matter conceptually either way. I think I found using Base.RefValue was a bit slower in practice compared to mutating a field of a mutable.

return (SimpleEdge{T}(u, v) for v in g.adjlist[u])
end

GraphsBase.in_edges(g::SimpleGraph, v) = out_edges(g, v)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about reverse.(out_edges(g, v))?

Something I found slightly annoying in Graphs.jl was that SimpleEdge was sometimes treated as directed and sometimes not, which leads to weird corner cases when designing code that is generic between undirected and directed graphs. Ideally, I think an undirected graph should act as much like a directed graph that has edges in both directions between vertices that have edges. So from that perspective it may be helpful to have in_edges and out_edges output edges with reverse directions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#16


GraphsBase.src(e::SimpleWeightedEdge) = e.src
GraphsBase.dst(e::SimpleWeightedEdge) = e.dst
GraphsBase.weight(e::SimpleWeightedEdge) = e.weight
Copy link

@mtfishman mtfishman Sep 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have examples where the weight would be determined in some potentially non-trivial way from vertex metadata, so it doesn't seem so natural to store it on an edge.

What about allowing weight(g::AbstractGraph, e::AbstractEdge)? It could then have a fallback weight(e::AbstractEdge) in the cases when it can be determined just from an edge.

With the current design the only way to customize the weight function is by defining a new edge type, which doesn't seem so practical in general (i.e. I would want to customize by graph type or graph instance, say by storing a weight function as a field inside the graph type which I access when overloading weight).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#14

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for the laconic answers, I'm just opening issues as I go to keep track of your ideas in a more discoverable location

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, thanks. I'm sure these same issues/questions have been brought up before, sorry for rehashing things. It's definitely more helpful to look at a concrete code proposal rather than longer abstract discussions, so thanks for making this PR!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@etiennedeg deserves more credit, my PR is a simplified, structured version of his

GraphsBase.src(e::SimpleWeightedEdge) = e.src
GraphsBase.dst(e::SimpleWeightedEdge) = e.dst
GraphsBase.weight(e::SimpleWeightedEdge) = e.weight
Base.reverse(e::SimpleWeightedEdge) = SimpleEdge(e.dst, e.src, e.weight)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guess this should be SimpleWeightedEdge(...)?

Comment on lines +23 to +27
function GraphsBase.in_edges(g::SimpleWeightedDiGraph, v)
A = g.weights
a = A[v, :]
return (SimpleWeightedEdge{T}(u, v, a[u]) for u in a.nzind)
end

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this points to a challenge of coupling metadata with edges.

What if your graph requires some non-trivial calculation to access metadata (I can definitely think of situations where that is the case)? Every time you ask for an edge, you would need to perform that computation, even though most graph functions don't need the metadata and probably just want the incident vertices.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#14

@CameronBieganek
Copy link

The adjlist::Vector{Vector{<:Integer}} data structure in SimpleGraph seems incompatible with add_vertex!(g, v). In other words, I would expect to be able to do the following:

g = SimpleGraph{Int}()
add_vertex!(g, 1_000_000)

...but the vector of vectors data structure is not well-suited for that sort of usage.

In other words, in my opinion, the generic interface should guarantee that add_vertex!(g, v) will always succeed regardless of the value of v, as long as the type of v is compatible with the vertex type for g.

@gdalle
Copy link
Member Author

gdalle commented Oct 3, 2023

In my opinion, the generic interface should guarantee that add_vertex!(g, v) will always succeed regardless of the value of v, as long as the type of v is compatible with the vertex type for g.

That's not for the interface to impose, that's a matter of implementation. The interface just provides the method add_vertex!(g, v), but then the implementation is free to throw an error if v != nv(g) + 1.

SimpleGraph was designed with a contiguous vertex set 1:n in mind, and that's why it is so efficient. Of course it also comes with the limitation above.
If you want somewhere you can add arbitrary vertices, you can roll out your own implementation like VertexSafeGraphs.jl. The goal is to allow both in a single interface.

@CameronBieganek
Copy link

That's not for the interface to impose, that's a matter of implementation. The interface just provides the method add_vertex!(g, v), but then the implementation is free to throw an error if v != nv(g) + 1.

Julia interfaces provide more than just method signatures, they also provide concepts. Here is the concept that I have in mind for adding and removing vertices:

  • A graph consists of a set of vertices and a set of edges.
  • Operations such as add and rem should be as set-like as possible. In particular:
    • Let $V$ be the set of vertices in g.
    • Adding a vertex to g by using add_vertex!(g, v) has the following effect:
      • $V \leftarrow V \cup \{ v \}$
    • Removing a vertex from g by using rem_vertex!(g, v) has the following effect:
      • $V \leftarrow V \setminus \{ v \}$

In my opinion, in the new interface this concept should apply to all subtypes of AbstractGraph. SimpleGraph does not get an exemption.

For clarity, here's an example that shows what I expect to happen:

# vertices(g) == [1, 2, 3, 4, 5]
rem_vertex!(g, 3)   # now `vertices(g) == [1, 2, 4, 5]
add_vertex!(g, 42)  # now `vertices(g) == [1, 2, 4, 5, 42]

@mtfishman
Copy link

@CameronBieganek I don't see why adding any vertex should be allowed for any graph type, since that won't even be possible in many cases (say adding a string vertex to a graph that only supports integer vertices). I think it's reasonable to throw an error if you try to add a vertex that isn't compatible with the data structure (or isn't fast to add), as @gdalle suggested above.

@CameronBieganek
Copy link

I don't see why adding any vertex should be allowed for any graph type

That was just an imprecision in my language above. I mean any vertex of a compatible type. The precise definition is contained in my docstring for GraphInterface.add_vertex!:

"""
    add_vertex!(g::AbstractGraph, v)

If the vertex `v` is not in the graph `g`, then add it to `g`, otherwise do nothing. This is
guaranteed to succeed as long as the following is true:

```julia
isequal(convert(vertex_type(g), v), v)
```

Thus, `v` must be convertible to the vertex type of graph `g` *and* the result of the conversion
must be `isequal` to `v`. Thus, if `g` has vertex type `Int`, then `add_vertex!(g, 'a')` will
fail, because `isequal(97, 'a')` returns `false`.

Returns the graph `g` (after adding `v`).
"""

So, you can add an Int8 vertex to a graph with vertices of type Int64, but you cannot add a Char vertex to that graph. Note that Base.Set and Base.Dict do the same thing:

julia> push!(Set(1), 'a')
ERROR: ArgumentError: 'a' is not a valid key for type Int64
Stacktrace:
 [1] setindex!(h::Dict{Int64, Nothing}, v0::Nothing, key0::Char)
   @ Base .\dict.jl:363
 [2] push!(s::Set{Int64}, x::Char)
   @ Base .\set.jl:103
 [3] top-level scope
   @ REPL[1]:1

@mtfishman
Copy link

mtfishman commented Oct 3, 2023

I still don't see why we wouldn't allow graph types (like SimpleGraph) where you can only add a vertex add_vertex!(g, v) such that v == nv(g) + 1, what problem does that cause? I could see an argument that it is hard to make code generic between graphs with such constrained vertex label requirements and more general ones, but that seems to be the core of the challenge of generalizing Graphs.jl. Seems like we should try to solve that before disallowing it entirely, since that constraint could make graph functions faster, which is why (from what I understand) it was chosen for the core graph type of Graphs.jl in the first place.

@CameronBieganek
Copy link

The behavior of add_vertex!(g, v) on SimpleGraph is unfortunate, but the real nail in the coffin for SimpleGraph is the behavior of rem_vertex!(g, v). Suppose I work for Facebook and on Monday I create a graph that represents a social network among 10 individuals, labeled from 1 to 10. I want to find the distance between 4 and 9, so I run shortest_path(g, 4, 9). Then on Tuesday, person 2 deletes their account, so I run rem_vertex!(g, 2). I still want to know the distance between 4 and 9, so I run shortest_path(g, 4, 9) again. But that will give me the wrong answer, because the vertex codes from 2 to 10 have all changed! And in fact, I won't know the manner in which they've changed unless I read the source code of rem_vertex!(g::SimpleGraph, v). That is unacceptable behavior for a generic graph interface.

...I know that Graphs.rem_vertex! has a warning that says "This operation has to be performed carefully [...] since internally the removal is performed by swapping the vertices v and |V|, and removing the last vertex |V| from the graph." However, implementation details like that should not leak out into a generic interface.

@mtfishman
Copy link

Agreed that the behavior of rem_vertex! is more problematic. I guess I have some optimism that supporting both SimpleGraph-like graphs (let's call them linear vertex graphs) and also generic vertex graphs can be salvaged with good generic code and maybe some traits or data structure specialization (like create_vertex_container brought up in 1) to dispatch on functions that assume linear vertices (for the sake of performance) vs. generic vertices. Not all graph functions use rem_vertex!, so it wouldn't be problematic to toss in a SimpleGraph.

In fact, I searched for rem_vertex! and rem_vertices! in the Graphs.jl repository and get no hits in any graph algorithms:

so I don't see why supporting them would be an issue. It seems like if a user doesn't like the behavior of rem_vertex! then they should use a different graph type.

Footnotes

  1. https://discourse.julialang.org/t/rfc-graphinterface-jl-an-interface-proposal-for-graphs-jl-version-2-0/104518/12

@mtfishman
Copy link

mtfishman commented Oct 3, 2023

I appreciate the sentiment that functions should have contracts, and it could make sense that rem_vertex! strictly follows the rule that rem_vertex!(g, v) returns a graph such that v ∉ vertices(g) and all other vertices are preserved. What if, for SimpleGraph-like graphs, there is a restriction that:

  • add_vertex!(g, v) only works if v == nv(g) + 1.
  • rem_vertex!(g, v) only works if v == nv(g).

Then, there could be some internal function for removing the vertices of a SimpleGraph that is used by graph functions that are ok with not preserving the vertices, and the vertices are just tracked internally as needed (for example rem_vertex_and_shift! or something like that). If users want to add or remove other vertices they can use a different graph type, or use those different functions knowing they will get relabeled vertices.

@mtfishman
Copy link

Here is a proposal for generic linear indexing of graphs.

For graphs with linear vertices (like SimpleGraph):

add_vertex!(g, v) # Only works if `v == nv(g) + 1`
rem_vertex!(g, v) # Only works if `v == nv(g)`

and for more general graphs, there would be fewer constraints on v (say just constrained by the vertex type, but that's up to the graph type to decide).

To add or remove arbitrary vertices of a SimpleGraph (which would involve shifting the vertex labels):

insert_vertex!(g, v) # Analogous to `insert!` for inserting an element into a `Vector` in Base Julia
add_vertex!(g, v * th) # Syntax from https://github.com/jishnub/OrdinalIndexing.jl for linear indexing

delete_vertex!(g, v) # Analogous to `deleteat!` for deleting an element of a `Vector` in Base Julia
rem_vertex!(g, v * th) # Syntax from https://github.com/jishnub/OrdinalIndexing.jl for linear indexing

These could generalize to a graph with general vertices that has an underlying data structure that has linear vertices (say a graph with arbitrary vertices wrapping a SimpleGraph), in which case the vertex v above corresponds to the linear parent vertex and directly accesses the parent graph. An advantage of the OrdinalIndexing.jl-style syntax is that it generalizes to other operations, such as neighbors(g, v * th) (of course it could be some other indexing object like neighbors(g, ParentVertex(v)) or neighbors(g, LinearVertex(v)), that's not so important).

Comment on lines +26 to +27
nv
ne

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really wish these horrible names would be removed or at least deprecated...

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvertices and nedges?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would at least fit with ndigits, nfields, ndims.

(Personally I've come to loath all these abbreviations -- they are super nice if you bang away on experiments in the REPL. But write code with them and come back after a year and try to read it, and it looks like gobbledygook. Esp. if you have to context switch between multiple languages and libraries... So I am now tending into the more "radical" direction of preferring names like num_vertices or even number_of_vertices. But I realize I am probably in the minority on this, which is fine, but I figure I should at least mention it once before staying silent forever after ;-). Anyhow, thank you for working on this ).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, nvertices and nedges seems decent.

@gdalle gdalle marked this pull request as draft October 30, 2023 17:06
@filchristou
Copy link

Haven't read everything, but I think the following was still not raised as a point: Given that MetaGraphs will be integrated in GraphBase isn't it a bit a redundant to also have WeightedGraph implemented differently ? I mean a weighted graph is nothing else but a MetaGraph with a primitive data type as edge data.
Having it as a MetaGraph might also be more flexible since composite types can be treated as weights ( if an interface is followed ).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants